🔈 Audio classification (with data aug)

Transformation pipeline for the training dataset

Read audio wave from filepath
1. Read wav file (tf.io.read_file)
2. Decode wav file (tf.audio.decode_wav)
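As a framework-free illustration of the two steps above, the sketch below reads a WAV file with Python's standard `wave` module and NumPy instead of `tf.io.read_file` and `tf.audio.decode_wav`. It assumes 16-bit PCM input and returns float samples in roughly [-1, 1] with shape `[samples, channels]`, which is the layout `tf.audio.decode_wav` produces. The function name `read_wav` is hypothetical.

```python
import wave

import numpy as np


def read_wav(path):
    """Read a 16-bit PCM WAV file (assumption: no other encodings are handled).

    Returns (audio, sample_rate), where audio is float32 in about [-1, 1]
    with shape (n_samples, n_channels).
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # Little-endian int16 samples, interleaved by channel.
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    return samples.reshape(-1, n_channels), sample_rate
```

For a mono file, `audio[:, 0]` gives the 1-D waveform that the later steps operate on.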
Remove silence from the beginning and the end (tfio.audio.trim) (OPTIONAL)
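The trimming step can be sketched with a simple amplitude threshold. This is a NumPy stand-in for the idea, not the `tfio.audio.trim` call itself. The function name `trim_silence` and the default `epsilon` are assumptions chosen for illustration.

```python
import numpy as np


def trim_silence(audio, epsilon=0.01):
    """Drop leading and trailing samples with |amplitude| <= epsilon.

    audio: 1-D waveform. Quiet samples between loud ones are kept.
    """
    loud = np.flatnonzero(np.abs(audio) > epsilon)
    if loud.size == 0:
        return audio[:0]  # the whole clip is below the threshold
    return audio[loud[0]: loud[-1] + 1]
```

The threshold is worth tuning per dataset: too high a value cuts quiet onsets, too low a value leaves background hiss in place.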
Limit audio to a fixed number of seconds
- Shorter audio –> Pad the end with zeros
- Longer audio –> Random crop
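A minimal NumPy sketch of the fixed-length step, assuming a 1-D waveform and a target length given in samples (seconds times sample rate). The name `fix_length` is hypothetical.

```python
import numpy as np


def fix_length(audio, target_len, rng=None):
    """Return exactly target_len samples from a 1-D waveform.

    Shorter clips are zero-padded at the end; longer clips are cropped
    at a random offset.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = audio.shape[0]
    if n < target_len:
        return np.pad(audio, (0, target_len - n))
    start = int(rng.integers(0, n - target_len + 1))  # upper bound is exclusive
    return audio[start:start + target_len]
```

The random crop doubles as a mild augmentation on the training set. For validation data a deterministic crop (for example, a center crop) is the more common choice.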
Data augmentation over audio wave
- Change Speed
- Pink noise
- Gaussian noise
- Gaussian SNR
- Gain (Volume Adjustment)
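Three of the waveform augmentations above can be sketched in a few lines of NumPy. The function names are hypothetical, and the speed change uses plain linear interpolation, which also shifts pitch. That is an assumption about what "Change Speed" means here. Pink noise is left out because it needs a spectral shaping step.

```python
import numpy as np


def change_speed(audio, rate):
    """Resample a 1-D waveform by linear interpolation.

    rate > 1 shortens the clip (faster), rate < 1 lengthens it (slower).
    Pitch changes along with speed in this simple version.
    """
    n_out = int(round(len(audio) / rate))
    old_idx = np.arange(len(audio))
    new_idx = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(new_idx, old_idx, audio)


def add_gaussian_snr(audio, snr_db, rng):
    """Add white Gaussian noise scaled to a target signal-to-noise ratio in dB."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise


def apply_gain(audio, gain_db):
    """Scale amplitude by a gain in dB (+6 dB is roughly a factor of 2)."""
    return audio * (10.0 ** (gain_db / 20.0))
```

In a training pipeline each transform would typically be applied with some probability and with its parameter (rate, SNR, gain) drawn at random per example.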
Convert audio to MelSpectrogram
1. Convert audio to spectrogram (tfio.audio.spectrogram)
2. Apply the Mel scale (tfio.audio.melscale)
3. Apply the dB scale (tfio.audio.dbscale)
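To make the three conversion steps concrete, here is a NumPy sketch of the same chain: a short-time Fourier power spectrogram, a triangular mel filterbank, and a dB conversion with a dynamic-range floor. It is an illustration of the technique, not a reimplementation of the `tfio.audio` functions. All names and default parameters (`n_fft`, `hop`, `n_mels`, `top_db`) are assumptions.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters, shape (n_fft // 2 + 1, n_mels)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_fft // 2 + 1, n_mels))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[:, i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel_spectrogram(audio, sample_rate, n_fft=512, hop=128, n_mels=64, top_db=80.0):
    """1-D waveform (len >= n_fft) -> array of shape (n_frames, n_mels) in dB."""
    # 1. Power spectrogram from overlapping Hann-windowed frames.
    n_frames = 1 + (len(audio) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 2. Mel scale: pool FFT bins into n_mels perceptually spaced bands.
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate)
    # 3. dB scale, clipped to top_db below the loudest value.
    db = 10.0 * np.log10(np.maximum(mel, 1e-10))
    return np.maximum(db, db.max() - top_db)
```

The output is laid out as time by frequency, so it can be treated as a single-channel image by the augmentation and model stages that follow.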
Data augmentation over MelSpectrogram
- Time Warping (tfa.image.sparse_image_warp) (from the SpecAugment paper)
- Time Masking (tfio.audio.time_mask) (from the SpecAugment paper)
- Frequency Masking (tfio.audio.freq_mask) (from the SpecAugment paper)
- Mixup
- Any other image transformation
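The masking and Mixup items above can be sketched in NumPy as follows. The masks zero out a random contiguous band along one axis, in the spirit of SpecAugment, and Mixup blends two examples and their one-hot labels with a Beta-distributed weight. Function names, the `(time, freq)` layout, and the choice of zero as the mask value are assumptions. Time warping is omitted.

```python
import numpy as np


def mask_axis(spec, axis, max_width, rng):
    """Zero a random contiguous band of up to max_width entries along axis.

    Assumes max_width <= spec.shape[axis].
    """
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, spec.shape[axis] - width + 1))
    out = spec.copy()
    index = [slice(None)] * spec.ndim
    index[axis] = slice(start, start + width)
    out[tuple(index)] = 0.0
    return out


def time_mask(spec, max_width, rng):
    return mask_axis(spec, 0, max_width, rng)  # axis 0 = time frames


def freq_mask(spec, max_width, rng):
    return mask_axis(spec, 1, max_width, rng)  # axis 1 = mel bands


def mixup(x1, y1, x2, y2, alpha, rng):
    """Blend two examples and their one-hot labels with weight lam ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2
```

Mixup is usually applied per batch, pairing each example with a shuffled copy of the same batch, and it requires a loss that accepts soft labels.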
Add the coordconv channel (OPTIONAL)
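The outline does not say which coordinate the extra channel encodes. As one plausible reading, the sketch below appends a single channel holding the normalized position along the frequency axis, so that a convolutional model can tell low bands from high ones. The function name and the `(time, freq)` input layout are assumptions. A time-coordinate channel could be added the same way.

```python
import numpy as np


def add_freq_coord_channel(spec):
    """(time, freq) float array -> (time, freq, 2).

    Channel 0 is the spectrogram; channel 1 is the frequency position,
    scaled linearly to [-1, 1] and repeated for every time frame.
    """
    n_time, n_freq = spec.shape
    coords = np.linspace(-1.0, 1.0, n_freq, dtype=spec.dtype)
    coord_channel = np.broadcast_to(coords, (n_time, n_freq))
    return np.stack([spec, coord_channel], axis=-1)
```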
Normalize (standard scale)
- When using transfer learning, apply the mean and std that the pretrained model expects
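Standard scaling can be sketched as below. The function name `standardize` is hypothetical. When `mean` and `std` are omitted, the sketch falls back to the statistics of the input itself, which is an assumption made for brevity. In practice they would normally be computed once over the training set, or taken from the pretrained model when doing transfer learning.

```python
import numpy as np


def standardize(x, mean=None, std=None, eps=1e-8):
    """Return (x - mean) / std.

    mean, std: scalars or arrays broadcastable to x. If omitted, they are
    computed from x itself (per-example scaling).
    eps avoids division by zero for constant inputs.
    """
    mean = x.mean() if mean is None else mean
    std = x.std() if std is None else std
    return (x - mean) / (std + eps)
```

Whatever statistics are chosen, the same values must be reused unchanged for the validation and test data.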